Abstract: Power series distributions form a useful subclass of one-parameter 
discrete exponential families suitable for modeling count data. A zero-inflated 
power series distribution is a mixture of a power series distribution and a 
degenerate distribution at zero, with a mixing probability $p$ for the degenerate 
distribution. This distribution is useful for modeling count data that may have 
extra zeros. One question is whether the mixture model can be reduced to 
the power series portion, corresponding to $p = 0$, or whether there are so 
many zeros in the data that zero inflation relative to the pure power series 
distribution must be included in the model, i.e., $p > 0$. The problem is difficult 
partially because $p = 0$ is a boundary point. 

Here, we present a Bayesian test for this problem based on recognizing 
that the parameter space can be expanded to allow $p$ to be negative. Negative 
values of $p$ are inconsistent with the interpretation of $p$ as a mixing 
probability; however, they index distributions that are physically and 
probabilistically meaningful. We compare our Bayesian solution to two standard 
frequentist testing procedures and find that, in simulations, using a posterior 
probability as a test statistic gives slightly higher power than the score test 
and the likelihood ratio test on the most important ranges of the sample size 
$n$ and the parameter values. Our method also performs well on three real data sets. 



1. Zero-inflated families 

Models for count data often fail to fit in practice because of the presence of more 
zeros in the data than is explained by a standard model. This situation is often called 
zero inflation because the number of zeros is inflated from the baseline number of 
zeros that would be expected in, say, a one-parameter discrete exponential family. 
Zero inflation is a special case of overdispersion that contradicts the relationship 
between the mean and variance in a one-parameter exponential family. One way 
to address this is to use a two-parameter distribution so that the extra parameter 
permits a larger variance. Efron [9] developed the notion of a double exponential 
family, a two-parameter modification of a standard one-parameter exponential family 
that allows a higher variance than permitted by the one-parameter version. This 
is reasonable in some examples; nevertheless, typical count data distributions, such 
as the Poisson, cannot be used to model data containing extra zeros. 

Johnson, Kotz and Kemp ([13], pages 312-318) discuss a simple modification of 
a power series (PS) distribution $f(\cdot \mid \theta)$ to handle extra zeros. An extra proportion of 
zeros, $p$, is added to the proportion of zeros from the original discrete distribution, 
while decreasing the remaining proportions in an appropriate way. So the 
zero-inflated PS distribution is defined as 

$$
f^*(y \mid p, \theta) =
\begin{cases}
p + (1-p)\, f(0 \mid \theta), & \text{if } y = 0,\\
(1-p)\, f(y \mid \theta), & \text{if } y > 0,
\end{cases}
\tag{1}
$$

where $\theta \in \Theta$, the parameter space, and the mixing parameter $p$ ranges over the 
interval 

$$
-\frac{f(0 \mid \theta)}{1 - f(0 \mid \theta)} < p < 1.
$$

This allows the distribution to be well defined for certain negative values of $p$, 
depending on $\theta$. Although the mixing interpretation is lost when $p < 0$, these values 
have a natural interpretation in terms of zero deflation, relative to a PS model. 
Correspondingly, $p > 0$ can be regarded as zero inflation relative to a PS model. 
Note that the PS family contains all discrete one-parameter exponential families so 
an appropriate choice of PS model in (1) permits any desired interpretation for the 
data corresponding to the second term. The first term allows an extra proportion $p$ 
of zeros to be added to the discrete PS distribution; these data are effectively regarded 
as a sort of contamination. Note that zero inflation (zero deflation, respectively) 
does not imply that model (1) has larger (smaller, respectively) variance than the 
non-inflated version. 

The first question to be asked is whether the degenerate distribution at zero is 
necessary. If it is not, then no zero inflation needs to be modeled and the model 
simplifies to $f(y \mid \theta)$. Clearly, this is a hypothesis testing problem. If $p$ is not allowed 
to be negative, $p = 0$ is a boundary point and testing $H_0: p = 0$ vs. $H_1: p > 0$ 
is a notoriously difficult problem for both Bayesians and frequentists for which few 
results are available. (See Self and Liang [18] and Silvapulle and Silvapulle [19] for 
some asymptotics from a frequentist perspective.) Permitting negative values of $p$ 
removes the boundary point problem so that the analytic challenges become 
manageable. The Bayes test obtained here compares favorably with standard frequentist 
methods in the real and simulated data cases we have examined. 

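Familiar cases in which testing $H_0: p = 0$ is useful include the zero-inflated 
Poisson (ZIP) distribution with parameters $(p, \theta)$ given by 

$$
f^*(y \mid p, \theta) = p\, 1_{\{y=0\}} + (1-p)\, \frac{e^{-\theta}\theta^{y}}{y!}, \qquad y = 0, 1, 2, \ldots
\tag{2}
$$

in which $\theta > 0$, $-e^{-\theta}/(1 - e^{-\theta}) < p < 1$ and $E(Y \mid p, \theta) = (1-p)\theta$, and the zero-inflated 
geometric distribution with parameters $(p, \theta)$: 

$$
f^*(y \mid p, \theta) = p\, 1_{\{y=0\}} + (1-p)(1-\theta)\theta^{y}, \qquad y = 0, 1, 2, \ldots
\tag{3}
$$

in which $0 < \theta < 1$, $-(1-\theta)/\theta < p < 1$, and $E(Y \mid p, \theta) = (1-p)\theta/(1-\theta)$. The 
zero-inflated binomial is similar. 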
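
As a quick numerical illustration of the zero-inflated Poisson and geometric mass functions, the following Python sketch evaluates both for any $p$ in the extended range, including negative values, so that normalization and the stated means can be checked directly. The function names (`zip_pmf`, `zig_pmf`, `p_lower_bound`) are ours and are introduced only for illustration.

```python
import math


def zip_pmf(y, p, theta):
    """Zero-inflated Poisson mass function; p may be negative within its range."""
    base = math.exp(-theta) * theta ** y / math.factorial(y)
    return p * (y == 0) + (1.0 - p) * base


def zig_pmf(y, p, theta):
    """Zero-inflated geometric mass function with f(y | theta) = (1 - theta) theta^y."""
    base = (1.0 - theta) * theta ** y
    return p * (y == 0) + (1.0 - p) * base


def p_lower_bound(f0):
    """Lower end of the range of p, -f(0|theta) / (1 - f(0|theta))."""
    return -f0 / (1.0 - f0)
```

At the lower end of the range the probability of a zero is exactly 0, which is the most extreme form of zero deflation; for instance, `zip_pmf(0, p_lower_bound(math.exp(-2.0)), 2.0)` evaluates to zero up to rounding.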

These models have been examined from a frequentist standpoint. The earliest 
results on zero inflation can be found in Cochran [4] and Rao and Chakravarti [17]. 
In fitting a Poisson model to count data these authors checked whether lack of 
fit was due to the presence of extra zeros in the data by using an exact test and 
a likelihood ratio test. Also in the context of a ZIP model, El-Shaarawi [10] obtained 
the ML estimator and used its asymptotic distribution to construct a confidence 
interval for the mean parameter. A peculiarity of the MLE for $p$ is that it can give 
negative values if there are no zeros in the data. Van Broek [1] derived the score 
test for the zero inflation parameter $p$ for testing $H_0: p = 0$ vs. $H_1: p \neq 0$. The 
two-sided alternative, however, gives up some power because the desired alternative 
is the one-sided $H_1: p > 0$. A secondary problem is that the performance of this test 
deteriorates as the mean parameter increases. This may not be a serious problem 
because, as the mean increases, excess zeros will become more visually obvious since 
the Poisson model assigns ever less probability to zero. 

More generally, Deng and Paul [7] extended the score test to a general one-parameter 
exponential family. Motivated by industrial applications, they studied a 
regression model for the mean parameter of the exponential family. Later, 
Deng and Paul [8] treated overdispersion and zero inflation simultaneously. In the 
ZIP context, Lambert [14] fitted a logistic regression model for $p$ and a log-linear 
model for $\theta$, using an EM algorithm to obtain estimates. Hall [12] extended this 
approach by adding random effects to the ZIP model and considered the case of a 
zero-inflated binomial model as well. 

From the Bayesian standpoint, Ghosh, Mukhopadhyay and Lu [11] estimated 
the parameters in a ZIP model in regression context as an alternative to tradition- 
ally used maximum likelihood based methods. Their simulation studies showed the 
Bayesian method had better finite sample performance than the classical method, 
giving tighter interval estimates and higher coverage probabilities. Our work can 
be regarded as a continuation of their work for hypothesis testing. 

In this paper, the main goal is to give a Bayes test of $H_0: p = 0$ vs. $H_1: p > 0$. 
To this end we consider the posterior probability 

$$
T(Y) = P(p > 0 \mid Y)
= \frac{\int_{\Theta} \int_{0}^{1} L(p, \theta)\, \pi(p \mid \theta)\, \pi(\theta)\, dp\, d\theta}
       {\int_{\Theta} \int_{-f(0 \mid \theta)/(1 - f(0 \mid \theta))}^{1} L(p, \theta)\, \pi(p \mid \theta)\, \pi(\theta)\, dp\, d\theta},
\tag{4}
$$

in which $L(p, \theta)$ is the likelihood function from a zero-inflated PS model and $Y$ is 
a vector of $n$ data points. The corresponding rejection region is $T(Y) > c$ for some 
suitable $c$. Asymptotic choice of $c$ is discussed in Section 3. Using (4) necessitates 
careful consideration of prior selection so that neither the null nor the alternative 
hypothesis will be unduly favored. This is done here by using Jeffreys' prior. 

Treating (4) as a frequentist test statistic, we derive some of its properties. In 
particular, we obtain higher order corrections for its asymptotic behavior. Then, 
we verify computationally that the power, a frequentist property, of the Bayes test 
for the ZIP family is roughly the same as or a little higher than the power of the score 
test and the likelihood ratio test, for the hardest and most important ranges of $n$, 
$p$ and $\theta$, i.e., small to moderate $p$, smallish $\theta$, and small to moderate $n$. This is 
striking because Jeffreys' priors favor small $\theta$'s and $p$'s near 0 and 1, and so are 
relatively unfavorable to the null. From the estimation standpoint, we verify that 
the posterior density is well behaved and gives reasonable credible intervals. As a 
final verification, we apply our techniques to three real data sets, computing Bayes 
factors and score tests for the presence of zero inflation and obtaining estimates for 
the zero inflation as appropriate. 

The structure of the paper is as follows. In Section 2 we provide some background 
on the properties of zero-inflated models from a Bayesian standpoint. In Section 3 
we present the Bayesian test and give some of its properties. In Section 4 we develop 
Bayes estimation. In Section 5 we give our comparisons and in Section 6 we use our 
method to analyze three data sets. 



2. Specifying the Bayes model 

Given a PS distribution it is easy to write down the zero-inflated model (1). Specifi- 
cation of a Bayes model also requires a prior distribution. In this section, we present 
some forms and properties of (1) along with the Fisher information matrix that will 
be required for finding objective priors. We start with the PS distribution case and 
then specialize. 



2.1. Zero-inflated power series distributions 



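Let $Y = (Y_1, Y_2, \ldots, Y_n)$ be a random sample of size $n$ from $f^*(y \mid p, \theta)$ defined in 
(1), where $f(y \mid \theta)$ is given by 

$$
f(y \mid \theta) = \frac{a_y \theta^{y}}{g(\theta)}, \qquad y = 0, 1, 2, \ldots,
\tag{5}
$$

in which $g(\theta) = \sum_{y=0}^{\infty} a_y \theta^{y}$ is the normalizing constant. It is easy to verify that 
$E(Y_i \mid p, \theta) = (1-p)\,\theta\, g'(\theta)/g(\theta)$. Writing 

$$
n_0 = \sum_{i=1}^{n} 1_{[Y_i = 0]}, \qquad S = \sum_{i=1}^{n} Y_i \qquad \text{and} \qquad \bar{Y} = S/n,
$$

the likelihood function based on $Y$ is, up to a factor that does not depend on $(p, \theta)$, 

$$
L(p, \theta) = \{p + (1-p) f(0 \mid \theta)\}^{n_0} \left\{ \frac{1-p}{g(\theta)} \right\}^{n - n_0} \theta^{S}.
\tag{6}
$$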

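Using (6), it is an exercise to derive ML estimates for $(p, \theta)$. From (6) it is easy 
to derive that the per unit Fisher information matrix $I(p, \theta) = ((I_{ij}(p, \theta)))$ for 
$i, j = 1, 2$ is given by 

$$
I(p, \theta) =
\begin{pmatrix}
\dfrac{1 - f(0 \mid \theta)}{(1-p)\{p + (1-p) f(0 \mid \theta)\}}
& \dfrac{f'(0 \mid \theta)}{p + (1-p) f(0 \mid \theta)} \\[2ex]
\dfrac{f'(0 \mid \theta)}{p + (1-p) f(0 \mid \theta)}
& (1-p)\left[ \dfrac{g''(\theta)}{g(\theta)} + \dfrac{g'(\theta)}{\theta g(\theta)}
  - \left( \dfrac{g'(\theta)}{g(\theta)} \right)^{2}
  - \dfrac{p\, f(0 \mid \theta)}{p + (1-p) f(0 \mid \theta)} \left( \dfrac{g'(\theta)}{g(\theta)} \right)^{2} \right]
\end{pmatrix},
$$

where $f'(0 \mid \theta) = \partial f(0 \mid \theta)/\partial\theta = -f(0 \mid \theta)\, g'(\theta)/g(\theta)$. 

It is seen that the off-diagonal terms are nonzero. (In general, however, the 
off-diagonal terms are zero under the reparametrization $p^* = p + (1-p) f(0 \mid \theta)$.) 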

Two special cases of (5) recur regularly, the zero-inflated Poisson and geometric. 
The zero-inflated binomial is similar; we do not treat it explicitly here. 



2.1.1. Zero-inflated Poisson 

The ZIP distribution with parameters $(p, \theta)$ results from (1) by using the 
Poisson($\theta$) probability mass function in place of $f(y \mid \theta)$ as indicated in (2). Parallel to (6), 
the likelihood function based on a sample of size $n$ is given by 

$$
L(p, \theta) = \{p + (1-p) e^{-\theta}\}^{n_0} \{(1-p) e^{-\theta}\}^{(n - n_0)}\, \theta^{s},
\tag{9}
$$

where $s$ denotes the observed value of $S$. Using (9), the MLE for $(p, \theta)$ can be derived, 
see El-Shaarawi [10], as the solution of 

$$
\frac{\hat\theta_1}{1 - e^{-\hat\theta_1}} = \frac{s}{n - n_0}
\qquad \text{and} \qquad
\hat p_1 = \frac{n_0/n - e^{-\hat\theta_1}}{1 - e^{-\hat\theta_1}}.
\tag{10}
$$
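
The ZIP maximum likelihood estimates have no closed form, so a small numerical sketch may help. It assumes the likelihood equations take the form $\hat\theta/(1 - e^{-\hat\theta}) = s/(n - n_0)$, with $\hat p$ then chosen so that the fitted probability of a zero equals $n_0/n$. Bisection is our own choice of root finder, and the name `zip_mle` is hypothetical.

```python
import math


def zip_mle(y):
    """Maximum likelihood estimates (p_hat, theta_hat) for a ZIP sample y."""
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    s = sum(y)
    if n0 == n:
        raise ValueError("all observations are zero; theta is not estimable")
    target = s / (n - n0)  # mean of the positive counts, always >= 1
    if target <= 1.0:
        raise ValueError("all positive counts equal 1; the root is at theta = 0")
    # h(theta) = theta / (1 - exp(-theta)) increases from 1 and exceeds theta,
    # so the root of h(theta) = target lies in (0, target).
    lo, hi = 1e-12, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / (-math.expm1(-mid)) < target:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    e = math.exp(-theta)
    p = (n0 / n - e) / (1.0 - e)
    return p, theta
```

With no zeros in the sample, `n0 / n - e` is negative, so the estimate of $p$ is negative, in line with the peculiarity of the MLE noted in Section 1.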

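Likewise, the test statistic for the score test for $H_0: p = 0$ can be derived as 

$$
T_s(Y) = \frac{\left( n_0 - n e^{-\hat\theta_0} \right)^{2}}
              {n e^{-\hat\theta_0} \left( 1 - e^{-\hat\theta_0} - \hat\theta_0 e^{-\hat\theta_0} \right)},
\tag{11a}
$$

where $\hat\theta_0 = \bar{Y}$ is the MLE under $H_0$, see Broek [1]. It can be shown that $\mathrm{sgn}(\hat p_1)\sqrt{T_s(Y)}$ 
asymptotically follows a standard normal distribution under $H_0: p = 0$ and a level 
$\alpha$ rejection region for testing $H_0: p = 0$ vs. $H_1: p > 0$ is given as 

$$
\mathrm{sgn}(\hat p_1)\sqrt{T_s(Y)} > z_\alpha,
\tag{11b}
$$

where $z_\alpha$ is the upper $\alpha$ cut-off point from the standard normal distribution. 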
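
The one-sided score test is simple enough to sketch in a few lines of Python. The sketch assumes the score statistic has the form $(n_0 - n e^{-\bar Y})^2 / \{n e^{-\bar Y}(1 - e^{-\bar Y} - \bar Y e^{-\bar Y})\}$ and uses the fact that the sign of $n_0 - n e^{-\bar Y}$ agrees with the sign of the MLE of $p$. The function name `zip_score_test` is ours.

```python
import math
from statistics import NormalDist


def zip_score_test(y, alpha=0.05):
    """Return (T_s, signed root, reject) for H0: p = 0 against H1: p > 0."""
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    theta0 = sum(y) / n  # MLE of theta under H0 (must be positive)
    e = math.exp(-theta0)
    numerator = n0 - n * e  # observed minus expected number of zeros
    variance = n * e * (1.0 - e - theta0 * e)
    z = numerator / math.sqrt(variance)  # signed root of the score statistic
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z * z, z, z > z_alpha
```

A sample consisting only of zeros gives `theta0 = 0` and a zero variance, so such degenerate samples need separate handling before calling this function.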

Similarly, the likelihood ratio test can be derived. We omit the details since, 
unlike the score test statistic, the likelihood ratio statistic does not have an 
explicit expression. If we denote the likelihood ratio test statistic by $T_l(Y)$, then 
$\mathrm{sgn}(\hat p_1)\sqrt{T_l(Y)}$ asymptotically follows $N(0, 1)$ under $H_0: p = 0$ and a level $\alpha$ 
rejection region for testing $H_0: p = 0$ vs. $H_1: p > 0$ is given as 

$$
\mathrm{sgn}(\hat p_1)\sqrt{T_l(Y)} > z_\alpha.
\tag{12}
$$

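For the ZIP family, the Fisher information matrix has entries 

$$
I(p, \theta) =
\begin{pmatrix}
\dfrac{1 - e^{-\theta}}{(1-p)\{p + (1-p) e^{-\theta}\}}
& -\dfrac{e^{-\theta}}{p + (1-p) e^{-\theta}} \\[2ex]
-\dfrac{e^{-\theta}}{p + (1-p) e^{-\theta}}
& \dfrac{1-p}{\theta} - \dfrac{p(1-p) e^{-\theta}}{p + (1-p) e^{-\theta}}
\end{pmatrix}.
$$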
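
One way to check the ZIP information matrix numerically is to compare closed-form entries with the expected outer product of the score vector, summed over the support. The sketch below assumes the entries $I_{11} = (1 - e^{-\theta})/[(1-p)p^*]$, $I_{12} = -e^{-\theta}/p^*$ and $I_{22} = (1-p)/\theta - p(1-p)e^{-\theta}/p^*$ with $p^* = p + (1-p)e^{-\theta}$. The function names are ours.

```python
import math


def zip_fisher_closed_form(p, theta):
    """Closed-form entries (I11, I12, I22) of the per unit ZIP information."""
    e = math.exp(-theta)
    p_star = p + (1.0 - p) * e  # probability of a zero
    i11 = (1.0 - e) / ((1.0 - p) * p_star)
    i12 = -e / p_star
    i22 = (1.0 - p) / theta - p * (1.0 - p) * e / p_star
    return i11, i12, i22


def zip_fisher_by_summation(p, theta, y_max=200):
    """Expected outer product of the score, summed over y = 0, ..., y_max - 1."""
    e = math.exp(-theta)
    p_star = p + (1.0 - p) * e
    i11 = i12 = i22 = 0.0
    for y in range(y_max):
        if y == 0:
            prob = p_star
            score_p = (1.0 - e) / p_star
            score_theta = -(1.0 - p) * e / p_star
        else:
            log_pois = -theta + y * math.log(theta) - math.lgamma(y + 1)
            prob = (1.0 - p) * math.exp(log_pois)
            score_p = -1.0 / (1.0 - p)
            score_theta = y / theta - 1.0
        i11 += prob * score_p * score_p
        i12 += prob * score_p * score_theta
        i22 += prob * score_theta * score_theta
    return i11, i12, i22
```

The same closed-form entries give the determinant $(1 - e^{-\theta} - \theta e^{-\theta})/(\theta p^*)$, which is the quantity under the square root in the joint Jeffreys' prior for the ZIP model.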



2.1.2. Zero-inflated geometric 

The zero-inflated geometric distribution with parameters $(p, \theta)$ results from (1) by 
using the geometric($\theta$) probability mass function in place of $f(y \mid \theta)$ as indicated in 
(3). Parallel to (6), the likelihood function based on a sample of size $n$ is given by 

$$
L(p, \theta) = \{p + (1-p)(1-\theta)\}^{n_0} \{(1-\theta)(1-p)\}^{n - n_0}\, \theta^{s}.
\tag{13}
$$

Using (13) the MLE's for $(p, \theta)$ can be derived; the test statistic for the score test 
for $H_0$ is 

$$
T_s(Y) = \frac{n(1 + \bar{Y})}{\bar{Y}^{2}} \left[ \frac{n_0}{n}(1 + \bar{Y}) - 1 \right]^{2}.
$$

In this case, the Fisher information matrix has entries 

$$
I(p, \theta) =
\begin{pmatrix}
\dfrac{\theta}{(1-p)\{p + (1-p)(1-\theta)\}}
& -\dfrac{1}{p + (1-p)(1-\theta)} \\[2ex]
-\dfrac{1}{p + (1-p)(1-\theta)}
& \dfrac{(1-p)(1 - \theta + p\theta^{2})}{(1-\theta)^{2}\,\theta\,\{p + (1-p)(1-\theta)\}}
\end{pmatrix}.
$$

For testing $H_0: p = 0$ vs. $H_1: p > 0$, the score test and likelihood ratio test can be 
expressed in terms of rejection regions similar to those given in (11b) and (12). 

2.2. Prior specification 

It is well known that Jeffreys' prior is the reference prior in the absence of nuisance 
parameters, see Clarke and Barron [3]. That is, Jeffreys' prior is objective in the 
sense that using Jeffreys' prior gives a posterior that updates the prior as much as 
possible on average in relative entropy. Informally, it permits maximal information 
gain in a data transmission sense. For a small number of parameters, here 2, this 
is a reasonable optimality criterion. 

By definition, Jeffreys' prior on $(p, \theta)$ is the square root of the determinant of 
the Fisher information matrix, 

$$
\pi_J(p, \theta) \propto \left( \det(I(p, \theta)) \right)^{1/2}.
\tag{14}
$$

For a zero-inflated power series distribution, there is no convenient expression in 
general for $\det(I(p, \theta))$. However, for the ZIP model, Jeffreys' prior is 

$$
\pi_J(p, \theta) \propto \frac{(1 - e^{-\theta} - \theta e^{-\theta})^{1/2}}{[\theta\{p + (1-p) e^{-\theta}\}]^{1/2}}.
\tag{15a}
$$

As is typical for reference priors, (15a) is improper: the integral over $\theta \in (0, \infty)$ 
diverges. Likewise, in a zero-inflated geometric, Jeffreys' prior is 

$$
\pi_J(p, \theta) \propto \frac{\theta^{1/2}}{(1-\theta)\{p + (1-p)(1-\theta)\}^{1/2}}.
\tag{15b}
$$

Again, this is improper because the integral over $\theta$ diverges. 

Jeffreys' prior, given by (14), would be appropriate if both $p$ and $\theta$ were of equal 
interest. Here, we are mainly interested in $p$. So, we used the Jeffreys' prior for $p$ 
for given $\theta$, that is 

$$
\pi_J(p \mid \theta) \propto [I_{11}(p, \theta)]^{1/2},
\tag{16}
$$

and used the Jeffreys' prior for $\theta$ derived from the non-inflated model $f(y \mid \theta)$. 
For a zero-inflated PS model, (16) gives 

$$
\pi_J(p \mid \theta) = \frac{\{1 - f(0 \mid \theta)\}^{1/2}}{\pi\,(1-p)^{1/2}\,[p + (1-p) f(0 \mid \theta)]^{1/2}}.
\tag{17}
$$

If $g(\theta)$ corresponds to a zero-inflated Poisson model, (17) gives 

$$
\pi_J(p \mid \theta) \propto \left[ \frac{1 - e^{-\theta}}{(1-p)\{p + (1-p) e^{-\theta}\}} \right]^{1/2},
\tag{18a}
$$

and if $g(\theta)$ corresponds to a zero-inflated geometric distribution, (17) gives 

$$
\pi_J(p \mid \theta) \propto \left[ \frac{\theta}{(1-p)\{p + (1-p)(1-\theta)\}} \right]^{1/2}.
\tag{18b}
$$

Note that in both (18a, b) the range of $p$ includes a range of negative values, 
depending on $\theta$, for which the prior density is well defined. 
Parallel to (16), the Jeffreys' prior for $\theta$ in the PS model is 

$$
\pi_J(\theta) \propto \left[ \frac{g''(\theta)}{g(\theta)} + \frac{g'(\theta)}{\theta g(\theta)}
  - \left( \frac{g'(\theta)}{g(\theta)} \right)^{2} \right]^{1/2}.
\tag{19}
$$

Expression (19) gives $1/\sqrt{\theta}$ and $1/[(1-\theta)\sqrt{\theta}]$ for the Poisson and geometric model, 
respectively. These are improper. However, the posterior turns out to be proper 
because a finite number of data points suffice to make it so. If a proper joint 
objective prior is desired, Rissanen's prior, see Rissanen [16], can be adapted and 
gives similar results but is computationally more demanding. 



3. Test criterion based on posterior probability 

For the general case of a zero-inflated PS model, the posterior is formed by using 
(6) and (14). In the ZIP model, these expressions become (9) and (15a). Another 
reasonable choice would be (9) with (17), which becomes (18a), and $\pi(\theta) = 1/\sqrt{\theta}$. 

Given these choices, the Bayes test for zero inflation is based on the posterior 
probability that $p > 0$. Thus, consider the statistic 

$$
T(Y) = P(p > 0 \mid Y)
= \frac{\int_{\Theta} \int_{0}^{1} L(u, \theta)\, \pi(u \mid \theta)\, \pi(\theta)\, du\, d\theta}
       {\int_{\Theta} \int_{-f(0 \mid \theta)/(1 - f(0 \mid \theta))}^{1} L(u, \theta)\, \pi(u \mid \theta)\, \pi(\theta)\, du\, d\theta}.
\tag{20}
$$

The main point of this section is to derive an asymptotic test based on $T$ by finding 
the asymptotic distribution of $T(Y)$ under $H_0: p = 0$. It is reasonable to conclude 
that there is zero inflation when $P(p > 0 \mid Y)$ is close to one. Consequently, from a 
frequentist standpoint, the rejection region is given by $T(Y) > c$ where $c$ is chosen 
based on the given level of significance. 



3.1. Finite sample properties of the test statistic 

Note that (20) exploits the extended parameter space for $p$, namely, 
$-f(0 \mid \theta)/(1 - f(0 \mid \theta)) < p < 1$, so that as $\theta$ increases, the lower bound approaches 0 from the left. 

Let $P_{(p,\theta)}(\cdot)$ be the probability measure for a zero-inflated PS family. It can be 
verified that for large sample size, $P_{(p,\theta)}(T(Y) > c)$ is increasing in $p$ for fixed $\theta$ and 
increasing in $\theta$ for fixed $p$. That is, as zero inflation increases, the probability that 
$T$ is large (close to one) increases, and as the probability of large outcomes of 
$Y$ increases, the probability of concluding zero inflation also increases. This means that a single 
occurrence of zero can appear to be zero inflation if $\theta$ is large enough. 

One feature which makes $T$ easy to use is that, as a generality, the joint 
posterior distribution for $(p, \theta)$ and the marginal posterior for $p$ are typically unimodal. 
Indeed, $\pi(p \mid y)$ is typically unimodal, even for small sample sizes. The posterior 
densities from the simulations reported in Section 5 and the data analysis in Section 6 
are all unimodal. 

3.2. Asymptotic distribution of the test statistic 

For ease of exposition, let $\eta = (\eta_1, \eta_2) = (p, \theta)$ so that $Y_1, \ldots, Y_n$ are IID with 
density $f^*(\cdot \mid \eta)$ and $T(Y) = P^{\pi}(\eta_1 > \eta_{10} \mid Y)$ where $\eta_{10}$ is a fixed value. 

First, we sketch a proof that under $\eta = (\eta_{10}, \eta_{20}) = \eta_0$, the frequentist 
distribution of $T$ is asymptotically Uniform[0, 1], i.e., $T = U + o_p(1)$ as $n \to \infty$. Then 
we derive an expression for the asymptotic behavior of the first two moments of 
$T$. Although these arguments are presented in the zero-inflated PS context, they 
appear to be more general. 

Start by writing 

$$
\ell(\eta) = \frac{1}{n} \sum_{i=1}^{n} \log f^*(Y_i \mid \eta)
\qquad \text{and} \qquad
\hat\eta = (\hat\eta_1, \hat\eta_2) = \arg\max_{\eta} \ell(\eta).
$$

Letting $D_j$ denote $\partial/\partial\eta_j$ for $j = 1, 2$ define 

$$
a_{ij} = D_i D_j \ell(\hat\eta) \qquad \text{and} \qquad a_{ijk} = D_i D_j D_k \ell(\hat\eta).
$$

Under consistency conditions for the MLE and expected local supremum conditions 
on $f^*(\cdot \mid \eta)$ on a neighborhood around a fixed value $\eta_0$, 

$$
a_{ijk} \to E_{\eta_0} D_i D_j D_k \log f^*(Y_1 \mid \eta_0), \qquad \text{a.e. } P_{\eta_0}.
$$

The empirical Fisher information is $I(\hat\eta) = (I_{ij}(\hat\eta))$ and it is seen that 
$I_{ij}(\hat\eta) = -a_{ij}$. To ensure $I(\hat\eta)$ is well defined, assume that it is positive definite on 
a set $S^*$ with $P_{\eta}(S^*) = 1 + o(1/\sqrt{n})$. Now the inverse is $I^{-1}(\hat\eta) = (I^{ij}(\hat\eta))$; it is 
needed to define the quantities that will appear in the asymptotic expression for $T$. 
Set 

„,(« = -!«(« 

and denote $\pi_j(\hat\eta) = D_j \pi(\hat\eta)$. Finally, the quantities that appear in the asymptotic 
expression to order $O(1/\sqrt{n})$ are the second degree Hermite polynomial $J_2(t) = 
t^2 - 1$, and two correction terms 



Ga(fj) = ^a l ifem l m J m fe (/ 11 (^)) 3/2 



and 



7T^77j Z Z 

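using the convention that repeated indices indicate summation. To get the form of 
the result, let 

$$
V = \frac{\sqrt{n}\,(\eta_1 - \hat\eta_1)}{(I^{11}(\hat\eta))^{1/2}}
\qquad \text{and} \qquad
W = \frac{\sqrt{n}\,(\eta_{10} - \hat\eta_1)}{(I^{11}(\hat\eta))^{1/2}}.
$$

Note that under $\eta_1 = \eta_{10}$, $V$ is the same as $W$. 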

At last, from (2.3.19) in Datta and Mukerjee [6], taking j3\ = j3% = we get 

$$
P(\eta_1 < \eta_{10} \mid Y) = P(V < w \mid Y)
= \Phi(w) + n^{-1/2} \phi(w) \left\{ G_1(\pi, \hat\eta) + G_3(\hat\eta) J_2(w) \right\} + o(n^{-1/2}),
\tag{21}
$$

where $w$ is the observed value of $W$ and $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal pdf 
and cdf, respectively. However, when $\eta_{10}$ is true $W$ is asymptotically $N(0, 1)$. So, 
by the probability integral transform $\Phi(W)$ is Uniform[0, 1] and the $1/\sqrt{n}$ 
terms ensure the required rate. Thus, since $T$ is of the form $1 - P(\eta_1 < \eta_{10} \mid Y)$, $T$ 
is asymptotically Uniform[0, 1] as well. 

To derive expressions for the moments of $T$, write $G_1(\pi, \hat\eta) = r_1(\eta_0) + o_p(1)$, and 
$G_3(\hat\eta) = r_3(\eta_0) + o_p(1)$. Recognizing that $J_2$ is just a polynomial, it can be seen that 
there is an $H(\eta_0)$ so that (21) can be written as 

$$
P(\eta_1 < \eta_{10} \mid Y) = P(V < w \mid Y) = \Phi(w) + n^{-1/2} \phi(w) H(\eta_0) + o_p(n^{-1/2}),
\tag{22}
$$

and the expectation with respect to $P_{\eta}$ can be taken on both sides. Using the result 
from that and applying Step 3 from Datta and Mukerjee [6], page 19, gives an 
expression for the frequentist probability from the middle term in (22): 

$$
P_{\eta_0}(V < w) = \Phi(w) + n^{-1/2} \phi(w) H^*(\eta_0) + o_p(n^{-1/2}),
\tag{23}
$$

where $H^*$ is derived from $H$ and the $\eta_1$ in the probability is $\eta_{10}$, the true value. 
Differentiating (23) it is possible to derive an approximation for the density of $V$, 
$f_V(w \mid \eta_0)$. 

Finally, by using (21) and $f_W(w \mid \eta_0)$, it is possible to derive expressions for the 
first 2 moments of $T$, because they only depend on $W$. Doing so gives that they 
are 1/2 and 1/12, as expected from the limiting uniform. However, given the 
correction terms, it is possible to equate the expressions for the first 2 moments 
of $P(\eta_1 < \eta_{10} \mid Y)$ to the first two moments of a Beta($\alpha$, $\beta$) and thereby derive 
expressions for $\alpha$ and $\beta$. Obviously, the resulting $\alpha$ and $\beta$ must converge to 1, i.e., 
give the Uniform[0, 1] in the limit for large $n$, but for finite $n$ this provides a more 
refined approximation. 
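
The moment-matching step can be illustrated with a short Python sketch. Given a mean and a variance, it returns the parameters of the Beta distribution with those two moments; here the variance is assumed to be the second central moment, and the name `beta_from_moments` is ours. The corrected moments themselves come from the expansions above and are not computed here.

```python
def beta_from_moments(mean, var):
    """Parameters (alpha, beta) of the Beta law with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must lie strictly between 0 and mean * (1 - mean)")
    # For Beta(alpha, beta): var = mean * (1 - mean) / (alpha + beta + 1).
    total = mean * (1.0 - mean) / var - 1.0  # alpha + beta
    return mean * total, (1.0 - mean) * total
```

With the limiting values, mean 1/2 and variance 1/12, the function returns parameters equal to 1 up to rounding, which is the Uniform[0, 1] case.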

4. Credible intervals 

Although one can in principle find a Bayes estimate for p, under say squared error 
loss, and find its posterior variance, Bayes tests are based on posterior probabilities 
which in turn are based on the posterior density. These also lead to credible sets. 

There are two main types of credible sets. The first is analogous to confidence 
intervals: $\alpha/2$ of the probability in each of the tails is clipped off and the upper and lower 
boundaries announced. The second is highest posterior density (HPD), i.e., a set of 
the form $R(\pi_\alpha) = \{p : \pi(p \mid y) \ge \pi_\alpha\}$, where $\pi_\alpha$ is the largest constant such that 
$P(p \in R(\pi_\alpha) \mid y) \ge 1 - \alpha$. For symmetric unimodal densities the two types of interval 
are equal, and here HPD sets are obtained from a variant on the procedure used to 
get $\alpha$-clipped credible sets. The basic idea is that if the credible interval or HPD 
set for $p$ contains the value $p = 0$, then we may conclude that there is not enough 
evidence of zero inflation in the data. 

Difficulties in the cases studied here arise because the posterior is not available 
in a convenient analytic form: the priors discussed in Section 2.2 do not yield 
tractable marginal posteriors for $p$ by directly integrating $\theta$ out of the joint 
posterior. Consequently, we find a Markov chain Monte Carlo (MCMC) estimate of the 
marginal posterior distribution and use it to find the $1 - \alpha$ credible and HPD sets. 
Thus, given a sample from the marginal posterior $\pi(p \mid Y = y)$ it is easy to form 
a $1 - \alpha$ credible interval by choosing the $\alpha/2$ and $1 - \alpha/2$ sample quantiles. This 
can also be done using draws from the joint $(p, \theta)$ posterior density. The HPD set 
can be found by using the draws to estimate $\pi(p \mid Y = y)$ by, say, $\hat\pi(p \mid Y = y)$ and 
obtaining approximate HPD sets from $\hat\pi$. 

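Suppose that $\pi(p, \theta \mid y)$ and $\pi(p \mid y)$ are the joint and marginal posterior, 
respectively, so that 

$$
\pi(p \mid y) = \int_{0}^{\infty} \pi(p, \theta \mid y)\, d\theta.
$$

In the case of the ZIP model (9), with the conditional Jeffreys' prior (18a) for $p$, and 
Jeffreys' prior $\pi(\theta) = 1/\sqrt{\theta}$ for $\theta$, the joint posterior is based on 

$$
\pi(p, \theta \mid y) \propto \{p + (1-p) e^{-\theta}\}^{n_0 - 1/2} (1-p)^{n - n_0 - 1/2}\, e^{-\theta(n - n_0)}\, \theta^{s - 1/2}\, (1 - e^{-\theta})^{1/2}.
\tag{24}
$$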

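Using (24), the goal is to estimate $\pi(p \mid y)$ from a joint sample of $(p, \theta)$ drawn 
from $\pi(p, \theta \mid y)$. Let $\{(p^{(i)}, \theta^{(i)}),\ i = 1, \ldots, B\}$ be an MCMC sample from $\pi(p, \theta \mid y)$ 
so that $\pi(p \mid y)$ can be estimated at $p = p^{(j)}$ by 

$$
\hat\pi(p^{(j)} \mid y) = \frac{1}{B} \sum_{i=1}^{B} \pi(p^{(j)}, \theta^{(i)} \mid y).
\tag{25}
$$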

Since it is computationally difficult to draw samples $(p^{(i)}, \theta^{(i)})$ from (24), we use 
a reparametrized model by transforming $p^* = p + (1-p) e^{-\theta}$. Incidentally, note 
that the parameters $p^*$ and $\theta$ result in an orthogonal reparameterization of the ZIP 
model. It can be checked that the Fisher information matrix is diagonal, given by 

$$
I(p^*, \theta) = \mathrm{diag}\left( \frac{1}{p^*(1 - p^*)},\ \frac{(1 - e^{-\theta} - \theta e^{-\theta})(1 - p^*)}{\theta (1 - e^{-\theta})^{2}} \right).
$$

As a result of this reparameterization, the joint posterior for $(p^*, \theta)$ can be written as 
a product of their marginals. In fact this idea can be extended in general for all 
zero-inflated PS distributions. 

Therefore, using the above fact the joint posterior distribution can be written as 

$$
\pi(p^*, \theta \mid y) \propto (p^*)^{n_0 - 1/2} (1 - p^*)^{n - n_0 - 1/2} \left( \frac{e^{-\theta}}{1 - e^{-\theta}} \right)^{n - n_0} \theta^{s - 1/2}.
\tag{26}
$$

From (26) it is seen that $(p^* \mid y)$ follows a Beta($n_0 + 1/2$, $n - n_0 + 1/2$), so it is easy 
to draw posterior samples of $p^*$. To draw samples of $\theta$, we use rejection sampling 
with a suitably chosen gamma distribution as envelope. In this way it is possible to 
generate a representative sample $\{(p^{*(i)}, \theta^{(i)}),\ i = 1, \ldots, B\}$ from the joint posterior 
and using the relationship between $p^*$ and $p$, we get $\{(p^{(i)}, \theta^{(i)}),\ i = 1, \ldots, B\}$ where 
$p^{(i)} = (p^{*(i)} - e^{-\theta^{(i)}})/(1 - e^{-\theta^{(i)}})$. Subsequently, using (25), we get an estimate of 
the marginal posterior density $\pi(p \mid y)$ at $p = p^{(i)}$. 
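
The sampling scheme just described can be sketched in Python as follows. The sketch assumes the posterior factorizes as a Beta($n_0 + 1/2$, $n - n_0 + 1/2$) law for $p^*$ times a density for $\theta$ proportional to $(e^\theta - 1)^{-(n - n_0)}\theta^{s - 1/2}$. The particular envelope is our own choice, not necessarily the one used for the results reported here: since $\theta/(e^\theta - 1) = e^{-\theta/2}(\theta/2)/\sinh(\theta/2)$, a gamma envelope with shape $s - (n - n_0) + 1/2$ and rate $(n - n_0)/2$ dominates the target, with acceptance probability $[(\theta/2)/\sinh(\theta/2)]^{n - n_0}$. The function names are hypothetical.

```python
import math
import random


def sample_zip_posterior(y, size=2000, seed=0):
    """Draw (p, theta) pairs from the ZIP posterior under the stated assumptions."""
    rng = random.Random(seed)
    n = len(y)
    n0 = sum(1 for v in y if v == 0)
    s = sum(y)
    m = n - n0  # number of positive counts
    if m == 0:
        raise ValueError("need at least one positive count")
    shape = s - m + 0.5  # positive because s >= m
    scale = 2.0 / m      # gammavariate takes (shape, scale); rate is m / 2
    draws = []
    while len(draws) < size:
        theta = rng.gammavariate(shape, scale)
        if theta <= 0.0:
            continue
        half = 0.5 * theta
        ratio = half / math.sinh(half)  # lies in (0, 1]
        if rng.random() < ratio ** m:   # rejection step for theta
            p_star = rng.betavariate(n0 + 0.5, m + 0.5)
            # Map back from p* to p.
            p = (p_star - math.exp(-theta)) / (-math.expm1(-theta))
            draws.append((p, theta))
    return draws


def posterior_prob_inflation(draws):
    """Monte Carlo estimate of P(p > 0 | y) from posterior draws."""
    return sum(1 for p, _ in draws if p > 0.0) / len(draws)
```

The acceptance rate of this envelope drops as the number of positive counts grows, so a tighter envelope would be preferable for large samples; the sketch favors a bound that is easy to verify.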

A $100(1 - \alpha)\%$ Bayesian credible interval for $p$ is simply $(p_{(\alpha/2)}, p_{(1 - \alpha/2)})$, where 
$p_{(k)}$ is the $k$-th quantile of $\{p^{(i)},\ i = 1, \ldots, B\}$. To find the HPD interval, there are 
several methods and algorithms, see Chen and Shao [2], and the references therein. 
Here, to obtain the HPD set from the posterior sample $\{p^{(i)},\ i = 1, \ldots, B\}$, we find 
$\{\hat\pi(p^{(i)} \mid y),\ i = 1, \ldots, B\}$ and set $\pi_\alpha$ to be the $100\alpha$-th percentile of $\hat\pi(p^{(i)} \mid y)$. This 
is adequate because the estimated posteriors are unimodal. Once we get $\pi_\alpha$, we 
solve $\hat\pi(p \mid y) = \pi_\alpha$ for cut-off values of $p$ to find the lower and upper limits of the 
HPD interval. 
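
Both kinds of interval can be approximated directly from posterior draws of $p$. The sketch below computes the equal-tail interval from sample quantiles and, as a stand-in for the density-based HPD procedure described above, the shortest interval containing a fraction $1 - \alpha$ of the sorted draws. This shortest-interval rule is a different and simpler technique that is appropriate only for unimodal posteriors. The function names and the quantile convention are ours.

```python
import math


def equal_tail_interval(samples, alpha=0.05):
    """Interval between the alpha/2 and 1 - alpha/2 sample quantiles."""
    xs = sorted(samples)
    last = len(xs) - 1
    lower = xs[int(math.floor((alpha / 2.0) * last))]
    upper = xs[int(math.ceil((1.0 - alpha / 2.0) * last))]
    return lower, upper


def shortest_interval(samples, alpha=0.05):
    """Shortest interval covering at least a fraction 1 - alpha of the draws."""
    xs = sorted(samples)
    count = len(xs)
    k = int(math.ceil((1.0 - alpha) * count))  # draws the interval must contain
    start = min(range(count - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]
```

If the resulting interval contains 0, the draws do not provide enough evidence of zero inflation, in the sense discussed at the start of this section.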



5. Performance comparison 



In this section the performance of the test statistic (20) for $H_0: p = 0$ vs. $H_1: p > 0$ 
in the ZIP family is compared to the performance of the score test and the likelihood 
ratio test in simulations. Recall that the Bayes test is formed from (9), (18a), and 
$\pi(\theta) = 1/\sqrt{\theta}$. The score test is given by (11a) and we numerically obtain the 
likelihood ratio statistic. In Table 1 below, we have computed the power of 
these three tests for several choices of $n$, $p$, $\theta$ for $H_0$ vs. $H_1$ at level $\alpha = 0.05$; 
in Table 2 the power of the one-sided Bayes test is compared to the power of the 
two-sided score and LR tests as well, also at the $\alpha = 0.05$ level. We can see that 
the Bayes test performs somewhat better than the two-sided score test and the 
two-sided likelihood ratio test. 

However, as in Section 4, the simulations for the Bayes test require MCMC 
because calculating $P(p > 0 \mid Y)$ under the ZIP model is not straightforward. Indeed, 
in general, it is not possible to provide a procedure that will work for any 
zero-inflated PS distribution. Nevertheless, for the ZIP model, (26) implies that (20) 



Table 1 

The entries are the powers for the Bayesian, one-sided score, and one-sided LR tests, with 
10,000 simulations. The asterisks indicate the entries where the Bayes test has the highest 
power. They are clustered around small to moderate p, small θ and small to moderate n 

            p = 0.00                 p = 0.10                 p = 0.30                 p = 0.40
θ     n     Score   Bayes   LR       Score   Bayes   LR       Score   Bayes   LR       Score   Bayes   LR
0.5   20    0.049   0.045   0.047    0.065   0.068   0.064    0.111   0.105   0.103    0.144   0.134   0.118
      50    0.046   0.043   0.042    0.078   0.076   0.072    0.180   0.159   0.154    0.251   0.212   0.209
      100   0.050   0.047   0.046    0.096   0.090   0.081    0.284   0.262   0.263    0.376   0.363   0.345
1.0   20    0.040   0.049   0.036    0.083   0.094*  0.082    0.232   0.247*  0.228    0.318   0.323*  0.311
      50    0.040   0.049   0.040    0.123   0.133*  0.126    0.433   0.434*  0.417    0.585   0.582   0.566
      100   0.045   0.047   0.048    0.181   0.182   0.188    0.670   0.671   0.680    0.840   0.841   0.843
1.5   20    0.042   0.053   0.040    0.123   0.143*  0.116    0.389   0.420*  0.387    0.544   0.564*  0.537
      50    0.040   0.047   0.043    0.214   0.225*  0.212    0.730   0.747*  0.739    0.884   0.895*  0.888
      100   0.045   0.046   0.046    0.345   0.311   0.351    0.951   0.936   0.953    0.992   0.991   0.993
2.0   20    0.046   0.052   0.035    0.194   0.213*  0.175    0.615   0.649*  0.600    0.763   0.801*  0.758
      50    0.053   0.053   0.045    0.345   0.363*  0.346    0.936   0.93    0.935    0.988   0.988   0.986
      100   0.044   0.053   0.042    0.577   0.484   0.557    0.998   0.995   0.998    1.000   1.000   1.000



Table 2 

The entries are the powers for the Bayesian test and the two-sided score and two-sided LR 
tests, with 10,000 simulations. Putting asterisks in this table gives the same pattern as in 
Table 1, but stronger 

            p = 0.00                 p = 0.10                 p = 0.30                 p = 0.40
θ     n     Score   Bayes   LR       Score   Bayes   LR       Score   Bayes   LR       Score   Bayes   LR
0.5   20    0.045   0.045   0.061    0.043   0.068   0.052    0.065   0.105   0.057    0.087   0.134   0.066
      50    0.046   0.043   0.050    0.055   0.076   0.056    0.122   0.159   0.106    0.181   0.212   0.136
      100   0.051   0.047   0.051    0.066   0.090   0.058    0.185   0.262   0.174    0.277   0.363   0.248
1.0   20    0.048   0.049   0.058    0.057   0.094   0.062    0.142   0.247   0.143    0.203   0.323   0.198
      50    0.049   0.049   0.051    0.075   0.133   0.078    0.303   0.434   0.296    0.443   0.582   0.430
      100   0.052   0.047   0.050    0.117   0.182   0.115    0.571   0.671   0.542    0.767   0.841   0.739
1.5   20    0.047   0.053   0.057    0.081   0.143   0.074    0.280   0.420   0.267    0.411   0.564   0.409
      50    0.051   0.047   0.051    0.140   0.225   0.131    0.618   0.747   0.612    0.806   0.895   0.809
      100   0.049   0.046   0.054    0.244   0.311   0.236    0.913   0.936   0.908    0.983   0.991   0.982
2.0   20    0.041   0.052   0.071    0.128   0.213   0.113    0.501   0.649   0.471    0.670   0.801   0.644
      50    0.049   0.053   0.057    0.257   0.363   0.228    0.890   0.935   0.880    0.975   0.988   0.973
      100   0.047   0.053   0.045    0.451   0.484   0.440    0.995   0.995   0.994    1.000   1.000   1.000



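can be written as 

$$
P(p > 0 \mid Y)
= \frac{\int_{0}^{\infty} \int_{e^{-\theta}}^{1} \left( \frac{\theta}{1 - e^{-\theta}} \right)^{n - n_0} p^{*\,n_0} (1 - p^*)^{n - n_0} e^{-\theta(n - n_0)} \theta^{s - (n - n_0)}\, \pi^*(p^*, \theta)\, dp^*\, d\theta}
       {\int_{0}^{\infty} \int_{0}^{1} \left( \frac{\theta}{1 - e^{-\theta}} \right)^{n - n_0} p^{*\,n_0} (1 - p^*)^{n - n_0} e^{-\theta(n - n_0)} \theta^{s - (n - n_0)}\, \pi^*(p^*, \theta)\, dp^*\, d\theta}
= \frac{E_g\left[ w(p^*, \theta)\, \pi^*(p^*, \theta) \right]}
       {E_g\left[ \left( \frac{\theta}{1 - e^{-\theta}} \right)^{n - n_0} \pi^*(p^*, \theta) \right]},
\tag{27}
$$

where $\pi^*(p^*, \theta)$ is the joint prior of $(p^*, \theta)$, 

$$
g(p^*, \theta) = p^{*\,n_0} (1 - p^*)^{n - n_0} e^{-\theta(n - n_0)} \theta^{s - (n - n_0)}
\qquad \text{and} \qquad
w(p^*, \theta) = I_{[p^* > e^{-\theta}]} \left( \frac{\theta}{1 - e^{-\theta}} \right)^{n - n_0}.
$$

So, we draw a random sample $\{(p^{*(i)}, \theta^{(i)}),\ i = 1, \ldots, B\}$ where $p^{*(i)} \sim$ Beta($n_0 + 1$, $n - n_0 + 1$) 
and $\theta^{(i)} \sim$ gamma($n - n_0$, $s - (n - n_0) + 1$), i.e., the gamma distribution with rate 
$n - n_0$ and shape $s - (n - n_0) + 1$ whose kernel is the $\theta$ part of $g$, and calculate 

$$
\hat{P}(p > 0 \mid Y) = \frac{\sum_{i=1}^{B} w(p^{*(i)}, \theta^{(i)})\, \pi^*(p^{*(i)}, \theta^{(i)})}
{\sum_{i=1}^{B} \pi^*(p^{*(i)}, \theta^{(i)}) \left( \frac{\theta^{(i)}}{1 - e^{-\theta^{(i)}}} \right)^{n - n_0}}.
$$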



In Tables 1 and 2, the test statistic T(Y) from (20) is compared to the cut-off point
found from the asymptotic distribution of T(Y) under H_0, i.e., we use the upper α
point of the Uniform(0, 1) distribution, as described in Section 3.2. All the simulations are based
on 10,000 replications with B = 10,000 MCMC samples in each replication.
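
The sampling scheme above can be sketched in a few lines. This is a minimal illustration for the zero-inflated Poisson case, not the code used for the simulations. The function name `prob_zero_inflation` and the `log_prior` argument are hypothetical, and the default (a flat prior on (p*, θ)) is an assumption made only so the example runs; it is not the objective prior used in the paper. The Beta and gamma parameters are chosen so that the sampling density is proportional to g(p*, θ), with the gamma read as having rate n − n_0 and shape s − (n − n_0) + 1.

```python
import numpy as np

def prob_zero_inflation(counts, B=10_000, log_prior=None, seed=0):
    """Monte Carlo estimate of P(p > 0 | Y) for a zero-inflated Poisson sample.

    log_prior(p_star, theta) returns the log joint prior at each draw.
    If omitted, a flat prior is assumed (an illustrative choice only).
    """
    y = np.asarray(counts)
    n = y.size
    n0 = int(np.sum(y == 0))   # number of observed zeros
    s = int(y.sum())           # sum of all counts
    m = n - n0                 # number of positive counts
    if m == 0:
        raise ValueError("need at least one positive count")

    rng = np.random.default_rng(seed)
    # Independent draws from the density proportional to g(p*, theta).
    p_star = rng.beta(n0 + 1, m + 1, size=B)
    theta = rng.gamma(shape=s - m + 1, scale=1.0 / m, size=B)

    # log of (theta / (1 - exp(-theta)))^(n - n0), plus the log prior.
    log_w = m * (np.log(theta) - np.log(-np.expm1(-theta)))
    if log_prior is not None:
        log_w = log_w + log_prior(p_star, theta)
    w = np.exp(log_w - log_w.max())   # common rescaling cancels in the ratio

    inflated = p_star > np.exp(-theta)  # indicator that p > 0
    return float(np.sum(w * inflated) / np.sum(w))
```

For example, `prob_zero_inflation(np.repeat([0, 1, 2, 3], [81, 9, 7, 1]))` applies the estimator to a sample with the UTI frequencies; because the prior differs from the one in the paper, the result need not match the reported posterior probability exactly.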

We comment that for the case of a zero-inflated geometric distribution, the procedure
is a little easier: it is just a matter of drawing samples from two different
Beta distributions with parameters based on the sample. So, it is easy to find an
MCMC estimate of the test statistic.

Table 1 shows that for the one-sided test, all three tests have roughly the same level
when p = 0. In fairness, the level for the score and LR tests is a little lower, which leads
to lower power against alternatives. However, looking at how the power of all three
tests rises as p increases, it is clear that for mid-sized p and smallish θ the
Bayes test has noticeably higher power, especially for small n. In fact, a good test
is most important on this range because it is hard to distinguish zero inflation from
its absence when θ is small or moderate, p ranges from small to mid-sized values,
and n is not large.

Table 2 is similar to Table 1 but the score and LR tests are two-sided. It shows 
that the same properties hold, but a little more strongly. This may be attributed 
to the fact that the Bayes test uses an extended parameter space, allowing some 
mass to represent zero deflation. 
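
To make the notion of level and power in these comparisons concrete, the following sketch estimates the rejection rate of a one-sided score test by simulation. It is an illustration only and covers just the score test, not the Bayes or LR tests. It assumes the zero-inflated Poisson model and the usual form of the Poisson score statistic for zero inflation (the form associated with [1]); the function names, the number of replications and the normal critical value 1.6449 (upper 5% point) are choices made for this example.

```python
import numpy as np

def score_z(y):
    """Signed root of the Poisson score statistic, one value per row of y."""
    n = y.shape[-1]
    ybar = y.mean(axis=-1)
    n0 = (y == 0).sum(axis=-1)
    p0 = np.exp(-ybar)                      # fitted Poisson P(Y = 0)
    var = n * p0 * (1.0 - p0 - ybar * p0)   # null variance of n0 - n * p0
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (n0 - n * p0) / np.sqrt(var)
    # Samples consisting only of zeros carry no evidence either way.
    return np.where(var > 0, z, 0.0)

def rejection_rate(p, theta, n, reps=4000, z_crit=1.6449, seed=0):
    """Fraction of simulated ZIP(p, theta) samples of size n that reject p = 0."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(theta, size=(reps, n))
    structural_zero = rng.random((reps, n)) < p   # zeros from the point mass
    y = np.where(structural_zero, 0, counts)
    return float(np.mean(score_z(y) > z_crit))
```

Calling `rejection_rate(0.0, 1.0, 100)` estimates the level at θ = 1 and n = 100, while a call such as `rejection_rate(0.3, 1.0, 100)` estimates the power against p = 0.3.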



6. Data analysis 

To demonstrate the efficacy of our technique, we apply it to test for the presence of
zero inflation in three famous datasets. We also give comparative values from other
techniques. In general, the results from the techniques corroborate each other, so
the fact that they are based on different principles lends credence to the conclusions.
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The first dataset that we look at is the Urinary Tract Infection (UTI) data used 
in Broek [1], who used a score test to detect zero inflation in a Poisson model.
The data were collected from 98 HIV-infected men attending the department of
internal medicine at the Utrecht University hospital. The number of times they
had a urinary tract infection was recorded as Y. The data are recorded in Table 3.
Merely by looking at the data it is clear that zero inflation is present.

Our method yields a Bayes factor for testing H_0 : p = 0 vs. H_1 : p > 0 of
B_10 = 223.13. The details of the computation of the Bayes factor will be reported elsewhere.
This is strong evidence in favor of the alternative, which is no surprise. In fact,
P(p > 0 | y) = 0.999. The observed value of the score statistic is 15.34, giving a
p-value of 0.0001.

The next data set we consider is the Terrorism data from Conigliani, Castro
and O'Hagan [5]. Table 4 shows the data concerning the number of incidents of
international terrorism per month (Y) in the United States between 1968 and 1974.
It is not immediately clear whether there is zero inflation in this data set. Conigliani,
Castro and O'Hagan [5] find a fractional Bayes factor for this data set of 0.0089;
we find a Bayes factor of B_10 = 0.28. In fact, P(p > 0 | y) = 0.507, an indeterminate
value. The observed value of the score statistic is 0.04, with a p-value of 0.83. All three
assessments agree that there is no evidence of zero inflation.

The third data set we analyzed is the Cholera data first analyzed by McKendrick
[15]. Table 5 shows the number of patients per household suffering from cholera
in a village in India in the 1920s. Again, looking at the data strongly suggests zero
inflation. The Bayes factor is B_10 = 238090, very strong evidence for zero
inflation, and under our method P(p > 0 | y) = 0.9999. The observed value of the score
statistic is 30.56, effectively giving a p-value of 0. Again, all three assessments agree
for this example.
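
The score statistics quoted for the three data sets can be checked directly from the frequencies in Tables 3 to 5. The sketch below assumes the standard Poisson score statistic for zero inflation, as used in [1], referred to a chi-square distribution with one degree of freedom, and it reads the count categories in the tables as 0, 1, 2, and so on. The function name `zip_score_test` is hypothetical.

```python
import math

def zip_score_test(freq):
    """Score statistic and chi-square(1) p-value from a {count: frequency} table."""
    n = sum(freq.values())
    s = sum(y * f for y, f in freq.items())
    ybar = s / n
    p0 = math.exp(-ybar)       # fitted Poisson probability of a zero
    n0 = freq.get(0, 0)        # observed number of zeros
    stat = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * ybar * p0 ** 2)
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square(1) upper tail
    return stat, p_value

# Frequencies from Tables 3, 4 and 5 (count categories assumed to start at 0).
uti = {0: 81, 1: 9, 2: 7, 3: 1}
terror = {0: 38, 1: 26, 2: 8, 3: 2, 4: 1}
cholera = {0: 168, 1: 32, 2: 16, 3: 6, 4: 1}

for name, freq in [("UTI", uti), ("Terror", terror), ("Cholera", cholera)]:
    stat, p_value = zip_score_test(freq)
    print(f"{name}: score = {stat:.2f}, p-value = {p_value:.4f}")
```

Under these assumptions the computed statistics agree with the values 15.34, 0.04 and 30.56 reported above.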

Although tests are useful for quantifying degree of belief, they are not the same as
looking at the posterior distributions directly. Figure 1 shows plots of the marginal
posteriors for p resulting from applying the ZIP model to each of the three data
sets. All three posteriors are unimodal and appear roughly symmetric. The location
of the mode and the spread around it determine the most credible values of p. For
the UTI and Cholera data the determination is clear: substantial zero inflation is
present. For the Terror data, the graph does not give a clear answer. The slight
asymmetry makes it difficult to tell whether p = 0 is reasonable. In fact, the test
shows it is, but this would be open to question from merely looking at the diagram.

Table 6 gives 95% Bayesian credible and HPD intervals for the three data sets
under consideration. Also, the marginal posterior distributions of p are given in



Table 3
UTI data

Y             0     1     2     3    Total
Frequency    81     9     7     1       98




Table 4
Terror data

Y             0     1     2     3     4    Total
Frequency    38    26     8     2     1       75




Table 5
Cholera data

Y             0     1     2     3     4    Total
Frequency   168    32    16     6     1      223
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Table 6 

Bayesian credible and HPD intervals

Data       Credible Interval      HPD Interval
Terror     (-0.6735, 0.2945)      (-0.5560, 0.3654)
Cholera    (0.4619, 0.7095)       (0.4700, 0.7144)
UTI        (0.3433, 0.8240)       (0.4271, 0.8561)



Figure 1. From the intervals and the figures as well, it is evident that there is a
noticeable amount of zero inflation in the Cholera data and the UTI data because the
interval of concentration of the posterior distributions does not include zero, whereas
for the Terror data the posterior distribution of p is centered around zero and the
interval contains zero, signifying the absence of zero inflation in the data.

Finally, for the sake of completeness, Table 6 gives 0.95 credible intervals and
HPD intervals calculated from the posteriors. It is seen that for the Cholera and
UTI data 0 is not in the intervals. This is consistent with the presence of zero
inflation. For the Terror data, 0 is in the interval. The interval is so wide that much of it
covers negative values. So, it is not a surprise that zero inflation is not indicated
by the test. Note that the credible and HPD sets are close for the Cholera data,
indicating symmetry, but for the other two data sets the difference in the intervals
suggests some left skewing, more for Terror than for UTI.
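
The contrast between the two kinds of interval in Table 6 can be illustrated with posterior draws. The sketch below computes an equal-tailed credible interval and an empirical HPD interval (the shortest interval containing the required fraction of draws) from a vector of samples. It is a generic illustration: the function names are hypothetical, and the left-skewed Beta(8, 2) draws stand in for a posterior sample; they are not the posteriors of p from the paper.

```python
import numpy as np

def equal_tailed_interval(samples, level=0.95):
    """Interval that cuts off (1 - level) / 2 probability in each tail."""
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(samples, [tail, 1.0 - tail])
    return float(lo), float(hi)

def hpd_interval(samples, level=0.95):
    """Shortest interval containing a fraction `level` of the sorted draws."""
    x = np.sort(np.asarray(samples))
    k = int(np.ceil(level * x.size))           # draws the interval must contain
    widths = x[k - 1:] - x[: x.size - k + 1]   # width of every candidate window
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

rng = np.random.default_rng(0)
draws = rng.beta(8, 2, size=50_000)   # left-skewed stand-in for a posterior
print("equal-tailed:", equal_tailed_interval(draws))
print("HPD:         ", hpd_interval(draws))
```

For a left-skewed sample the HPD interval sits to the right of the equal-tailed one, which is the pattern seen for the Terror and UTI rows of Table 6; for a symmetric posterior the two nearly coincide.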





Fig 1. Estimated posterior densities of p.
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7. Conclusions 

Overall, this article gives a general Bayesian setup for testing for zero inflation
in PS distributions that can be compared to existing likelihood-based methods
occurring in frequentist treatments. The basic idea is to extend the parameter
space to include a small range of negative values for the weight on zero inflation.
Thus, the null hypothesis H_0 : p = 0 becomes an interior point of the parameter
space and a standard Bayesian approach is feasible.
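
The extension of the parameter space can be made concrete for the Poisson case. The sketch below assumes the mixture form P(Y = 0) = p + (1 − p)e^{−θ} and P(Y = y) = (1 − p)e^{−θ}θ^y / y! for y ≥ 1. Requiring P(Y = 0) ≥ 0 then gives the lower bound p ≥ −e^{−θ} / (1 − e^{−θ}), which is derived here for illustration. The function name `zip_pmf` is hypothetical.

```python
import math

def zip_pmf(y, p, theta):
    """P(Y = y) under the mixture p * (point mass at 0) + (1 - p) * Poisson(theta).

    Negative p is allowed as long as every probability stays non-negative.
    """
    lower = -math.exp(-theta) / (1.0 - math.exp(-theta))
    if not (lower <= p <= 1.0):
        raise ValueError(f"p must lie in [{lower:.4f}, 1] for theta = {theta}")
    poisson = math.exp(-theta) * theta ** y / math.factorial(y)
    return (p if y == 0 else 0.0) + (1.0 - p) * poisson

# A negative weight still gives a proper distribution with fewer zeros
# than the Poisson (zero deflation).
probs = [zip_pmf(y, -0.2, 1.0) for y in range(60)]
print(sum(probs), probs[0], math.exp(-1.0))
```

With p = −0.2 and θ = 1 the probabilities remain non-negative and sum to one, and P(Y = 0) falls below the Poisson value e^{−1}. At the lower bound the distribution reduces to the zero-truncated Poisson.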

Our simulations suggest the Bayesian test has power as high as, or slightly higher
than, the likelihood-based tests, even when objective priors that are somewhat
unfavorable to the hypothesis H_0 : p = 0 are used to automate the procedure.
Interval estimation for p proceeds similarly, using the extended parameter space.

The technique of extending the parameter space applies generally to Bayes, and
potentially to frequentist, testing for zero inflation with count data, but obviously
can apply to many situations where two distributions are mixed and one wants to
know whether one component can be set to zero. In fact, the asymptotics for this
test require only generic regularity conditions; they do not rely on specific forms of
the likelihood such as exponential families. A further test of the method, aside from
applying it to more general mixtures, would be extending it to a class of regression
problems by including covariates.
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